information content
Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior
As hypothesis generation becomes increasingly automated, a new bottleneck has emerged: hypothesis assessment. Modern systems can surface thousands of statistical relationships (correlations, trends, causal links) but offer little guidance on which ones are novel, non-trivial, or worthy of expert attention. In this work, we study the complementary problem to hypothesis generation: automatic hypothesis assessment. Specifically, we ask: given a large set of statistical relationships, can we automatically assess which ones are novel and worth further exploration? We focus on correlations because they are a common entry point in exploratory data analysis and often serve as the basis for forming deeper scientific or causal hypotheses.
Improving Evolutionary Multi-View Classification via Eliminating Individual Fitness Bias
Evolutionary multi-view classification (EMVC) methods have gained wide recognition due to their adaptive mechanisms. Fitness evaluation (FE), which aims to calculate the classification performance of each individual in the population and provide a reliable performance ranking for subsequent operations, is a core step in such methods. Its accuracy directly determines the correctness of the evolutionary direction. That is, when FE fails to correctly reflect the superiority-inferiority relationship among individuals, the individual performance ranking becomes confused, which in turn misleads the evolutionary direction and traps the search in local optima. This paper is the first to identify this issue in the field of EMVC and calls it fitness evaluation bias (FEB). FEB may be caused by a variety of factors, and this paper approaches the issue from the perspective of view information content: existing methods generally adopt joint training strategies, which restrict the exploration of key information in views with low information content. This makes it difficult for the multi-view model (MVM) to achieve optimal performance at convergence, so FE fails to accurately reflect individual performance rankings, which ultimately triggers FEB. To address this issue, we propose an evolutionary multi-view classification via eliminating individual fitness bias (EFB-EMVC) method, which alleviates the FEB issue by introducing evolutionary navigators for each MVM, thereby providing a more accurate individual ranking. Experimental results verify the effectiveness of the proposed method in alleviating the FEB problem, and the EMVC method equipped with this strategy outperforms the original EMVC method.
A path to natural language through tokenisation and transformers
Berman, David S., Stapleton, Alexander G.
Natural languages exhibit striking regularities in their statistical structure, including notably the emergence of Zipf's and Heaps' laws. Despite this, it remains broadly unclear how these properties relate to the modern tokenisation schemes used in contemporary transformer models. In this note, we analyse the information content (as measured by the Shannon entropy) of various corpora under the assumption of a Zipfian frequency distribution, and derive a closed-form expression for the slot entropy expectation value. We then empirically investigate how byte-pair encoding (BPE) transforms corpus statistics, showing that recursive applications of BPE drive token frequencies toward a Zipfian power law while inducing a characteristic growth pattern in empirical entropy. Utilizing the ability of transformers to learn context-dependent token probability distributions, we train language models on corpora tokenised at varying BPE depths, revealing that the model predictive entropies increasingly agree with Zipf-derived predictions as the BPE depth increases. Attention-based diagnostics further indicate that deeper tokenisation reduces local token dependencies, bringing the empirical distribution closer to the weakly dependent (near IID) regime. Together, these results clarify how BPE acts not only as a compression mechanism but also as a statistical transform that reconstructs key informational properties of natural language.
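As a rough illustration of the quantity discussed in this abstract, the Shannon entropy of a Zipfian frequency distribution can be computed numerically. The sketch below is not the closed-form expression derived in the note; it assumes a finite vocabulary with rank-r probability proportional to r^(-s), and the function names, the default exponent s = 1, and the choice of bits as the unit are illustrative assumptions.

```python
import math


def zipf_probabilities(vocab_size, exponent=1.0):
    """Return normalised Zipfian probabilities p_r proportional to r**(-exponent).

    Assumption: a finite vocabulary of `vocab_size` ranks; the exponent
    default of 1.0 is illustrative, not taken from the note.
    """
    weights = [rank ** -exponent for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def zipf_entropy_bits(vocab_size, exponent=1.0):
    """Entropy of a Zipfian distribution over `vocab_size` ranks."""
    return shannon_entropy_bits(zipf_probabilities(vocab_size, exponent))


if __name__ == "__main__":
    for v in (10, 1_000, 100_000):
        # Compare against the uniform upper bound log2(V).
        print(v, zipf_entropy_bits(v), math.log2(v))
```

Under these assumptions the Zipfian entropy stays below the uniform bound log2(V) and grows slowly with vocabulary size, which is the kind of baseline the empirical entropies in the note are compared against.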
Dynamic Personalized Federated Learning with Adaptive Differential Privacy
Personalized federated learning with differential privacy has been considered a feasible solution to address non-IID distribution of data and privacy leakage risks. However, current personalized federated learning methods suffer from inflexible personalization and convergence difficulties due to two main factors: 1) We observe that prevailing personalization methods mainly personalize a fixed portion of the model, which lacks flexibility.
Use-Case-Grounded Simulations for Explanation Evaluation
A growing body of research runs human subject evaluations to study whether providing users with explanations of machine learning models can help them with practical real-world use cases. However, running user studies is challenging and costly, and consequently each study typically only evaluates a limited number of different settings, e.g., studies often only evaluate a few arbitrarily selected model explanation methods. To address these challenges and aid user study design, we introduce Simulated Evaluations (SimEvals). SimEvals involve training algorithmic agents that take as input the information content (such as model explanations) that would be presented to the user, to predict answers to the use case of interest. The algorithmic agent's test set accuracy provides a measure of the predictiveness of the information content for the downstream use case. We run a comprehensive evaluation on three real-world use cases (forward simulation, model debugging, and counterfactual reasoning) to demonstrate that SimEvals can effectively identify which explanation methods will help humans for each use case. These results provide evidence that SimEvals can be used to efficiently screen an important set of user study design decisions, e.g., selecting which explanations should be presented to the user, before running a potentially costly user study.
Architecture-Aware Generalization Bounds for Temporal Networks: Theory and Fair Comparison Methodology
Gahtan, Barak, Bronstein, Alex M.
Deep temporal architectures such as TCNs achieve strong predictive performance on sequential data, yet theoretical understanding of their generalization remains limited. We address this gap through three contributions: introducing an evaluation methodology for temporal models, revealing surprising empirical phenomena about temporal dependence, and developing the first architecture-aware theoretical framework for dependent sequences. Fair-Comparison Methodology. We introduce evaluation protocols that fix effective sample size $N_{\text{eff}}$ to isolate temporal structure effects from information content. Empirical Findings. Applying this method reveals that under $N_{\text{eff}} = 2000$, strongly dependent sequences ($\rho = 0.8$) exhibit approximately $76\%$ smaller generalization gaps than weakly dependent ones ($\rho = 0.2$), challenging the conventional view that dependence universally impedes learning. However, observed convergence rates ($N_{\text{eff}}^{-1.21}$ to $N_{\text{eff}}^{-0.89}$) significantly exceed theoretical worst-case predictions ($N^{-0.5}$), revealing that temporal architectures exploit problem structure in ways current theory does not capture. Lastly, we develop the first architecture-aware generalization bounds for deep temporal models on exponentially $\beta$-mixing sequences. By embedding Golowich et al.'s i.i.d. class bound within a novel blocking scheme that partitions $N$ samples into $B \approx N/\log N$ quasi-independent blocks, we establish polynomial sample complexity under convex Lipschitz losses. The framework achieves $\sqrt{D}$ depth scaling alongside the product of layer-wise norms $R = \prod_{\ell=1}^{D} M^{(\ell)}$, avoiding exponential dependence. While these bounds are conservative, they prove learnability and identify architectural scaling laws, providing worst-case baselines that highlight where future theory must improve.
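To make the two quantities in this abstract concrete, here is a minimal sketch of a partition of $N$ samples into $B \approx N/\log N$ blocks and of the layer-wise norm product $R = \prod_{\ell=1}^{D} M^{(\ell)}$. The abstract does not specify the logarithm base, the rounding rule, or whether gaps are left between blocks, so the natural log, nearest-integer rounding, and a contiguous gap-free partition are assumptions made only for illustration. All function names are hypothetical, and this is not the paper's blocking construction.

```python
import math


def num_blocks(n_samples):
    """Approximate block count B ~ N / log N.

    Assumptions: natural logarithm and nearest-integer rounding,
    clamped to the range [1, N].
    """
    if n_samples < 2:
        raise ValueError("need at least 2 samples")
    b = round(n_samples / math.log(n_samples))
    return max(1, min(n_samples, b))


def partition_into_blocks(n_samples):
    """Split indices 0..N-1 into B contiguous blocks of near-equal length.

    Assumption: no gaps between blocks; a mixing-based argument may
    instead discard or separate samples, which is not modelled here.
    """
    b = num_blocks(n_samples)
    base, extra = divmod(n_samples, b)
    blocks, start = [], 0
    for i in range(b):
        length = base + (1 if i < extra else 0)
        blocks.append(range(start, start + length))
        start += length
    return blocks


def norm_product(layer_norms):
    """R = product over layers of the per-layer norm bound M^(l)."""
    return math.prod(layer_norms)


if __name__ == "__main__":
    blocks = partition_into_blocks(2000)
    print(len(blocks), len(blocks[0]))
    print(norm_product([1.5, 2.0, 0.5]))
```

With these choices each block has length close to $\log N$, so the number of quasi-independent units grows almost linearly in $N$ while the within-block length grows only logarithmically.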